BAYESIAN LEARNING
OVERVIEW

• Bayes' rule, and how to turn it into a classifier
  - E.g. how to decide whether a patient is ill or healthy, based on
    - a probabilistic model of the observed data
    - prior knowledge
CLASSIFICATION PROBLEM

• Training data: examples of the form (d, h(d))
  - where d are the data objects to classify (inputs)
  - and h(d) is the correct class label for d, with h(d) ∈ {1, ..., K}
• Goal: given a new object d_new, provide h(d_new)

[Figure: supervised learning. Inputs feed a supervised learning system that produces outputs. The training information is the desired (target) output, and Error = (target output - actual output).]
A WORD ABOUT THE BAYESIAN FRAMEWORK

• Allows us to combine observed data and prior knowledge
• Provides practical learning algorithms
• It is a generative (model-based) approach, which offers a useful conceptual framework
  - This means that any kind of object (e.g. time series, trees, etc.) can be classified, based on a probabilistic model specification
Understanding Bayes' rule

BAYES' RULE

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}$$

where d = data and h = hypothesis.

Proof. Just rearrange:

$$P(h \mid d)\,P(d) = P(d \mid h)\,P(h)$$
$$P(d, h) = P(d, h)$$

which is the same joint probability on both sides.

Who is who in Bayes' rule
• P(h): prior belief (probability of hypothesis h before seeing any data)
• P(d | h): likelihood (probability of the data if the hypothesis h is true)
• $P(d) = \sum_h P(d \mid h)\,P(h)$: data evidence (marginal probability of the data)
• P(h | d): posterior (probability of hypothesis h after having seen the data d)
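The four quantities above fit together in a few lines of code. The following is a minimal sketch, not part of the original slides: the function name `posterior` and the numbers for `h1` and `h2` are hypothetical and chosen only to illustrate the calculation.

```python
def posterior(prior, likelihood):
    """Return P(h | d) for every hypothesis h via Bayes' rule.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(d | h) for the observed data d
    """
    # Numerator of Bayes' rule for each hypothesis: P(d | h) * P(h).
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence: P(d) = sum over h of P(d | h) * P(h).
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers, for illustration only.
prior = {"h1": 0.5, "h2": 0.5}
likelihood = {"h1": 0.2, "h2": 0.6}
post = posterior(prior, likelihood)
print(post)  # approximately {'h1': 0.25, 'h2': 0.75}
```

Because the evidence is just the sum of the numerators, the posteriors always sum to 1 over the hypothesis space.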
PROBABILITIES

• Have two dice, $h_1$ and $h_2$
• The probability of rolling an $i$ given die $h_1$ is denoted $P(i \mid h_1)$. This is a conditional probability
• Pick a die at random with probability $P(h_j)$, $j = 1$ or $2$. The probability of picking die $h_j$ and rolling an $i$ with it is called the joint probability and is $P(i, h_j) = P(h_j)\,P(i \mid h_j)$
• For any events X and Y, P(X, Y) = P(X | Y) P(Y)
• If we know P(X, Y), then the so-called marginal probability P(X) can be computed as $P(X) = \sum_Y P(X, Y)$
• Probabilities sum to 1. Conditional probabilities sum to 1 provided that their conditions are the same.
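The dice rules can be checked numerically. This sketch assumes a setup that is not in the slides: die `h1` is fair, die `h2` is loaded so that a six comes up half the time, and each die is picked with probability 1/2. Exact fractions are used so the sums come out exactly.

```python
from fractions import Fraction

# Assumed (hypothetical) setup: h1 is a fair die, h2 is loaded towards six.
P_h = {"h1": Fraction(1, 2), "h2": Fraction(1, 2)}
P_i_given_h = {
    "h1": {i: Fraction(1, 6) for i in range(1, 7)},
    "h2": {**{i: Fraction(1, 10) for i in range(1, 6)}, 6: Fraction(1, 2)},
}

# Joint probability: P(i, h) = P(h) * P(i | h).
joint = {(i, h): P_h[h] * P_i_given_h[h][i] for h in P_h for i in range(1, 7)}

# Marginal probability: P(i) = sum over h of P(i, h).
marginal = {i: sum(joint[(i, h)] for h in P_h) for i in range(1, 7)}

print(marginal[6])          # 1/3
print(sum(joint.values()))  # 1
```

Each conditional distribution sums to 1 over the faces for a fixed die, and the joint distribution sums to 1 over all (face, die) pairs.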
DOES PATIENT HAVE CANCER OR NOT?

A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 98% of the cases in which the disease is present, and a correct negative result in only 97% of the cases in which it is absent. Furthermore, only 0.008 of the entire population has this disease.

1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?
hypothesis 1: 'cancer'
hypothesis 2: '¬cancer'
(together these form the hypothesis space H)
data: '+'

$$P(\text{cancer} \mid +) = \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}{P(+)} = \ldots$$

P(+ | cancer) = 0.98
P(cancer) = 0.008
P(+) = P(+ | cancer) P(cancer) + P(+ | ¬cancer) P(¬cancer)

3. Diagnosis ??
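The blanks above can be filled in by direct arithmetic. The sketch below uses only the figures stated in the problem, plus two complements derived from them: P(¬cancer) = 1 - 0.008 and P(+ | ¬cancer) = 1 - 0.97. The variable names are illustrative.

```python
# Figures stated in the problem.
p_cancer = 0.008             # P(cancer)
p_pos_given_cancer = 0.98    # P(+ | cancer), correct positive rate
p_neg_given_healthy = 0.97   # P(- | not cancer), correct negative rate

# Derived by taking complements.
p_healthy = 1 - p_cancer                       # P(not cancer) = 0.992
p_pos_given_healthy = 1 - p_neg_given_healthy  # P(+ | not cancer) = 0.03

# Evidence: P(+) = P(+ | cancer) P(cancer) + P(+ | not cancer) P(not cancer).
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * p_healthy

# Posteriors via Bayes' rule.
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
p_healthy_given_pos = p_pos_given_healthy * p_healthy / p_pos

print(round(p_cancer_given_pos, 3))   # 0.209
print(round(p_healthy_given_pos, 3))  # 0.791
```

Even after a positive test, the posterior probability of cancer is only about 21%, because the prior of 0.008 is so small.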
CHOOSING HYPOTHESES

• Maximum Likelihood hypothesis:

$$h_{ML} = \arg\max_{h \in H} P(d \mid h)$$

• Generally we want the most probable hypothesis given the training data. This is the maximum a posteriori hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid d)$$

  - Useful observation: it does not depend on the denominator P(d)
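The observation about the denominator follows in one step from Bayes' rule: P(d) is the same for every hypothesis, so dividing by it cannot change which hypothesis attains the maximum. Written out:

```latex
\begin{align*}
h_{MAP} &= \arg\max_{h \in H} P(h \mid d) \\
        &= \arg\max_{h \in H} \frac{P(d \mid h)\,P(h)}{P(d)} \\
        &= \arg\max_{h \in H} P(d \mid h)\,P(h)
\end{align*}
```

If the prior P(h) is the same for all hypotheses, it can be dropped as well, and the MAP hypothesis coincides with the Maximum Likelihood hypothesis.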
NOW WE COMPUTE THE DIAGNOSIS

- To find the Maximum Likelihood hypothesis, we evaluate P(d | h) for the data d, which is the positive lab test, and choose the hypothesis (diagnosis) that maximises it:

    P(+ | cancer) = ...
    P(+ | ¬cancer) = ...

    => Diagnosis: $h_{ML}$ = ...

- To find the Maximum A Posteriori hypothesis, we evaluate P(d | h) P(h) for the data d, which is the positive lab test, and choose the hypothesis (diagnosis) that maximises it. This is the same as choosing the hypothesis that gives the higher posterior probability.

    P(+ | cancer) P(cancer) = ...
    P(+ | ¬cancer) P(¬cancer) = ...

    => Diagnosis: $h_{MAP}$ = ...
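The two decision rules can be compared side by side in code. This is a minimal sketch using the figures from the lab-test problem; P(+ | ¬cancer) = 1 - 0.97 and P(¬cancer) = 1 - 0.008 are derived by complement, and the dictionary keys are illustrative labels.

```python
# P(+ | h) and P(h) for the two hypotheses, from the lab-test problem.
likelihood = {"cancer": 0.98, "not cancer": 1 - 0.97}
prior = {"cancer": 0.008, "not cancer": 1 - 0.008}

# Maximum Likelihood: maximise P(d | h) and ignore the prior.
h_ml = max(likelihood, key=likelihood.get)

# Maximum A Posteriori: maximise P(d | h) * P(h).
map_score = {h: likelihood[h] * prior[h] for h in prior}
h_map = max(map_score, key=map_score.get)

print(h_ml)   # cancer
print(h_map)  # not cancer
```

The two rules disagree here: the likelihood alone favours cancer (0.98 against 0.03), while weighting by the prior gives about 0.0078 against 0.0298 in favour of no cancer.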
THE NAIVE BAYES CLASSIFIER

• What can we do if our data d has several attributes?
• Naive Bayes assumption: attributes that describe data instances are conditionally independent given the classification hypothesis

$$P(d \mid h) = P(a_1, \ldots, a_T \mid h) = \prod_t P(a_t \mid h)$$

  - it is a simplifying assumption; obviously it may be violated in reality
  - in spite of that, it works well in practice
• The Bayesian classifier that uses the Naive Bayes assumption and computes the MAP hypothesis is called the Naive Bayes classifier
• One of the most practical learning methods
• Successful applications:
  - Medical diagnosis
  - Text classification


DR. HAIDER ALI, Machine Learning


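EXAMPLE: 'PLAY TENNIS' DATA

Day    Outlook   Temperature  Humidity  Wind    Play Tennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No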
NAIVE BAYES SOLUTION

Classify any new datum instance $x = (a_1, \ldots, a_T)$ as:

$$h_{\text{Naive Bayes}} = \arg\max_h P(h)\,P(x \mid h) = \arg\max_h P(h) \prod_t P(a_t \mid h)$$

To do this, we need to estimate the parameters from the training examples:

- For each target value (hypothesis) h:
    $\hat{P}(h)$ := estimate of $P(h)$
- For each attribute value $a_t$ of each datum instance:
    $\hat{P}(a_t \mid h)$ := estimate of $P(a_t \mid h)$
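The slides do not say how the estimates are obtained. One common choice, consistent with the counts used in the worked 'Play Tennis' example, is relative frequency: count how often each value occurs within each class. The sketch below assumes that choice; the function name `estimate_parameters` and the tuple layout of the data are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The 14 'Play Tennis' examples: (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def estimate_parameters(data):
    """Relative-frequency estimates of P(h) and P(a_t | h)."""
    # How many examples fall in each class h (last column).
    class_counts = Counter(row[-1] for row in data)
    prior = {h: Fraction(n, len(data)) for h, n in class_counts.items()}
    # How often attribute t takes a given value within class h.
    value_counts = Counter(
        (t, value, row[-1]) for row in data for t, value in enumerate(row[:-1])
    )
    conditional = {
        (t, value, h): Fraction(n, class_counts[h])
        for (t, value, h), n in value_counts.items()
    }
    return prior, conditional


prior, conditional = estimate_parameters(DATA)
WIND = 3  # column index of the Wind attribute
print(prior["Yes"], prior["No"])             # 9/14 5/14
print(conditional[(WIND, "Strong", "Yes")])  # 1/3 (that is, 3/9)
print(conditional[(WIND, "Strong", "No")])   # 3/5
```

Combinations that never occur in the training data (for example Outlook = Overcast with PlayTennis = No) have no entry in `conditional`. Under plain relative-frequency estimation they correspond to an estimate of zero.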
Based on the examples in the table, classify the following datum x:

x = (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)

• That means: play tennis or not?

$$h_{NB} = \arg\max_{h \in \{yes,\, no\}} P(h)\,P(x \mid h) = \arg\max_{h \in \{yes,\, no\}} P(h) \prod_t P(a_t \mid h)$$

$$= \arg\max_{h \in \{yes,\, no\}} P(h)\,P(\text{Outlook}=\text{sunny} \mid h)\,P(\text{Temp}=\text{cool} \mid h)\,P(\text{Humidity}=\text{high} \mid h)\,P(\text{Wind}=\text{strong} \mid h)$$

Day    Outlook   Temperature  Humidity  Wind    Play Tennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

Working:

P(PlayTennis = yes) = 9/14 = 0.64
P(PlayTennis = no) = 5/14 = 0.36
P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
etc.

P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes) = 0.0053
P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no) = 0.0206

=> answer: PlayTennis(x) = no
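The working above can be reproduced end to end. This is a minimal sketch that assumes relative-frequency estimates, as in the hand calculation; the function name `score` and the tuple layout of the data are illustrative rather than taken from the slides.

```python
from math import prod

# The 14 'Play Tennis' examples: (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def score(x, h, data):
    """Return P(h) * prod_t P(a_t | h) using relative-frequency estimates."""
    rows_h = [row for row in data if row[-1] == h]
    prior = len(rows_h) / len(data)
    # P(a_t | h): fraction of class-h rows whose attribute t equals x[t].
    conditionals = [
        sum(1 for row in rows_h if row[t] == value) / len(rows_h)
        for t, value in enumerate(x)
    ]
    return prior * prod(conditionals)


x = ("Sunny", "Cool", "High", "Strong")
scores = {h: score(x, h, DATA) for h in ("Yes", "No")}
prediction = max(scores, key=scores.get)

print({h: round(s, 4) for h, s in scores.items()})  # {'Yes': 0.0053, 'No': 0.0206}
print(prediction)                                   # No
```

The two scores are unnormalised: dividing each by their sum would give the posterior probabilities, but the argmax, and hence the answer, stays the same.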
Given all the previous patients that I have seen, below are their symptoms and diagnoses:

[Table: symptoms and diagnoses of previous patients. The cell values are not recoverable, apart from one column reading Y, Y, Y, N, N, N, N, Y.]

Do I believe that a patient with the following symptoms has the flu, using the Naive Bayes algorithm?

[Query row: symptom values not recoverable; diagnosis = ?]
REMEMBER

• Bayes' rule can be turned into a classifier
• Maximum A Posteriori (MAP) hypothesis estimation incorporates prior knowledge; Maximum Likelihood doesn't
• The Naive Bayes classifier is a simple but effective Bayesian classifier for vector data (i.e. data with several attributes) that assumes that attributes are independent given the class
• Bayesian classification is a generative approach to classification
THANK YOU